Suppose you work in a building that has a fire alarm system. The alarm is designed to go off when there is a fire, but it can also be triggered by smoke from a malfunctioning HVAC system.
there is a 1% chance that there is a fire: \(P(Fire) = 0.01\)
the alarm system works pretty well and there is a 95% chance it goes off when there is an actual fire
\(P(\text{Alarm goes off | Fire}) = 0.95\)
there is a 10% chance that the alarm goes off due to smoke without a fire
\(P(\text{Alarm goes off | No Fire}) = 0.1\)
what is the probability that there is actually a fire, given that the alarm goes off?
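The question is answered by Bayes' rule: \(P(\text{Fire} \mid \text{Alarm}) = P(\text{Alarm} \mid \text{Fire}) \, P(\text{Fire}) / P(\text{Alarm})\). A quick check in Python (the variable names are ours):

```python
# Bayes' rule for the fire-alarm example (numbers from the text)
p_fire = 0.01            # P(Fire)
p_alarm_fire = 0.95      # P(Alarm | Fire)
p_alarm_no_fire = 0.10   # P(Alarm | No Fire)

# total probability that the alarm goes off
p_alarm = p_alarm_fire * p_fire + p_alarm_no_fire * (1 - p_fire)

# posterior probability of a fire, given that the alarm goes off
p_fire_alarm = p_alarm_fire * p_fire / p_alarm
print(round(p_fire_alarm, 4))  # → 0.0876
```

Even with a reliable alarm, the posterior is under 9%, because fires are rare and false alarms are not.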
Basic Concepts
Independence
Intuition: Information about the outcome of event A doesn’t change the probability of event B happening
Two events \(A\) and \(B\) are independent if
\[
P(A \cap B)=P(A)P(B)
\]
we can deduce that if \(A\) and \(B\) are independent, then
\(P(A) = P(A|B)\)
\(P(B) = P(B|A)\)
When there are more than two events, we say that they are mutually independent if, for every subset of the events, the probability of their intersection equals the product of their probabilities
Reminder:
pairwise independence does not imply mutual independence
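The classic counterexample uses two fair coin flips: let \(A\) be "first flip is heads", \(B\) "second flip is heads", and \(C\) "the two flips agree". A minimal enumeration of the sample space (exact arithmetic via `fractions`):

```python
from itertools import product
from fractions import Fraction

# sample space: two fair coin flips, each of the 4 outcomes has probability 1/4
outcomes = list(product("HT", repeat=2))

def P(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] == "H"   # first flip is heads
B = lambda w: w[1] == "H"   # second flip is heads
C = lambda w: w[0] == w[1]  # the two flips agree

# pairwise independence holds for every pair
assert P(lambda w: A(w) and B(w)) == P(A) * P(B)
assert P(lambda w: A(w) and C(w)) == P(A) * P(C)
assert P(lambda w: B(w) and C(w)) == P(B) * P(C)

# ...but mutual independence fails: A and B together determine C
print(P(lambda w: A(w) and B(w) and C(w)))  # → 1/4
print(P(A) * P(B) * P(C))                   # → 1/8
```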
Basic Concepts
random variable
A random variable \(X(\omega)\) is a function of the underlying outcome \(\omega \in \Omega\)
\(X(\omega)\) has a probability distribution that is induced by the underlying probability measure \(P\) and the function \(X(\omega)\):
\[
\textrm{Prob} (X \in A ) = \int_{\mathcal{G}} P(\omega) d \omega
\]
\(\qquad\) where \({\mathcal G}\) is the subset of \(\Omega\) for which \(X(\omega) \in A\)
Probability Distributions
A probability distribution \(\textrm{Prob} (X \in A)\) can be described by its cumulative distribution function (CDF)
\[
F_{X}(x) = \textrm{Prob}\{X\leq x\}.
\]
A continuous-valued random variable can be described by a density function \(f(x)\) that is related to its CDF by
\[
F_{X}(x) = \int_{-\infty}^{x} f(u) \, du
\]
when the number of possible values of \(X\) is finite or countably infinite, \(X\) is called discrete
we replace the density with a probability mass function (pmf), a non-negative sequence that sums to one
we replace integration with summation in the formula that relates the CDF to the pmf
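For example, with a fair six-sided die (our choice of discrete distribution), the CDF is just a running sum of the pmf:

```python
from fractions import Fraction

# pmf of a fair six-sided die
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x):
    # for a discrete random variable, summation replaces integration
    return sum(p for k, p in pmf.items() if k <= x)

print(cdf(3))  # → 1/2, the probability of rolling 1, 2, or 3
print(cdf(6))  # → 1, since the pmf sums to one
```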
let us now look at some common distributions for illustration
Common distributions
Discrete distributions
A discrete distribution is defined by a set of numbers \(S = \{x_1, \ldots, x_n\}\) and a probability mass function (pmf) on \(S\), which is a function \(p\) from \(S\) to \([0,1]\) with the property
\[
\sum_{i=1}^n p(x_i) = 1
\]
a random variable \(X\) has distribution \(p\) if \(X\) takes value \(x_i\) with probability \(p(x_i)\)
The Poisson distribution on \(S = \{0, 1, \ldots\}\) with parameter \(\lambda > 0\) has pmf
\[
p(x) = \frac{\lambda^x}{x!} e^{-\lambda}
\]
The interpretation of \(p(x)\) is: the probability of \(x\) events in a fixed time interval, where the events occur independently at a constant rate \(\lambda\).
the mean is \(\lambda\) and the variance is \(\lambda\)
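Both facts can be verified numerically by summing the pmf (the rate \(\lambda = 3.5\) below is an arbitrary example, and the infinite sums are truncated where the tail is negligible):

```python
from math import exp, factorial

lam = 3.5  # an example rate, chosen arbitrarily

def poisson_pmf(x):
    # pmf of the Poisson distribution with parameter lam
    return lam**x / factorial(x) * exp(-lam)

# truncate the infinite sums where the tail is negligible
ks = range(100)
total = sum(poisson_pmf(k) for k in ks)
mean = sum(k * poisson_pmf(k) for k in ks)
var = sum((k - mean)**2 * poisson_pmf(k) for k in ks)

print(round(total, 6), round(mean, 6), round(var, 6))  # ≈ 1.0 3.5 3.5
```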
Common distributions
Normal distribution
the most famous distribution is the normal distribution, which has density
\[
p(x) = \frac{1}{\sigma \sqrt{2\pi}}
\exp \left(- \frac{\left(x - \mu\right)^2}{2 \sigma^2} \right)
\]
it has two parameters, \(\mu \in \mathbb R\) and \(\sigma \in (0, \infty)\)
the mean is \(\mu\) and the variance is \(\sigma^2\)
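A quick check of these moments by simulation, using Python's built-in `random.gauss` (the values of \(\mu\) and \(\sigma\) below are our own example choices):

```python
import random
from statistics import fmean, pvariance

random.seed(42)
mu, sigma = 2.0, 1.5  # example parameters, chosen arbitrarily

# draw a large sample from N(mu, sigma^2)
sample = [random.gauss(mu, sigma) for _ in range(100_000)]

print(round(fmean(sample), 2))      # close to mu = 2.0
print(round(pvariance(sample), 2))  # close to sigma^2 = 2.25
```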
Common distributions
Continuous distributions
A continuous distribution is represented by a probability density function (pdf), which is a function \(p\) over \(\mathbb R\) such that \(p(x) \geq 0\) for all \(x\) and
\[
\int_{-\infty}^\infty p(x) dx = 1
\]
We say that random variable \(X\) has distribution \(p\) if
\[
\mathbb P\{a < X < b\} = \int_a^b p(x) dx
\]
for all \(a \leq b\)
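Interval probabilities under any such density can be recovered by numerical integration; a minimal sketch using the standard normal density as the example pdf (the midpoint rule and grid size are our choices, and `math.erf` supplies the exact value for comparison):

```python
from math import exp, pi, sqrt, erf

def p(x):
    # standard normal density, used here as an example pdf
    return exp(-x**2 / 2) / sqrt(2 * pi)

def prob(a, b, n=100_000):
    # midpoint Riemann sum for  P(a < X < b) = integral of p from a to b
    h = (b - a) / n
    return sum(p(a + (i + 0.5) * h) for i in range(n)) * h

print(round(prob(-1, 1), 4))       # → 0.6827
print(round(erf(1 / sqrt(2)), 4))  # exact value, also 0.6827
```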
Common distributions
Lognormal distribution
The lognormal distribution is a distribution on \(\left(0, \infty\right)\) with density
\[
p(x) = \frac{1}{\sigma x \sqrt{2\pi}}
\exp \left(- \frac{\left(\log x - \mu\right)^2}{2 \sigma^2} \right)
\]
It has two parameters, \(\mu\) and \(\sigma\)
the mean is \(\exp\left(\mu + \sigma^2/2\right)\)
the variance is \(\left[\exp\left(\sigma^2\right) - 1\right] \exp\left(2\mu + \sigma^2\right)\)
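The mean formula can be checked by simulation, using the fact that \(X = e^Z\) with \(Z \sim N(\mu, \sigma^2)\) is lognormal (the parameter values and sample size below are arbitrary choices):

```python
import random
from math import exp
from statistics import fmean

random.seed(0)
mu, sigma = 0.5, 0.25  # example parameters, chosen arbitrarily

# X = exp(Z) with Z ~ N(mu, sigma^2) has the lognormal distribution
sample = [exp(random.gauss(mu, sigma)) for _ in range(200_000)]

print(round(fmean(sample), 2))           # close to exp(mu + sigma^2/2)
print(round(exp(mu + sigma**2 / 2), 2))  # ≈ 1.7
```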
Common distributions
Gamma distribution
The gamma distribution is a distribution on \(\left(0, \infty\right)\) with density
\[
p(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}
\]
It has two parameters, the shape \(\alpha > 0\) and the rate \(\beta > 0\)
the mean is \(\alpha / \beta\) and the variance is \(\alpha / \beta^2\)
Sample mean and variance
Suppose we have an observed sample with values \(\{x_1, \ldots, x_n\}\)
The sample mean of this distribution is defined as
\[
\bar x = \frac{1}{n} \sum_{i=1}^n x_i
\]
The sample variance is defined as
\[
\frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2
\]
LLN and CLT
two of the most important results in probability and statistics
the law of large numbers (LLN)
the central limit theorem (CLT)
Let \(X_1, \ldots, X_n\) be independent and identically distributed scalar random variables with common distribution \(F\), common mean \(\mu\), and common variance \(\sigma^2\), and write \(\bar{X}_n := \frac{1}{n} \sum_{i=1}^n X_i\) for the sample mean
Law of large numbers
\[
P(|\bar{X}_n- \mu | \geq \varepsilon) \rightarrow 0 \text{ as } n \rightarrow \infty, \quad \forall \varepsilon > 0
\]
Central limit theorem
\[
\sqrt{n}(\bar{X}_n- \mu) \stackrel{d}{\to} N(0, \sigma^2)\quad \text{ as } \quad n \to \infty
\]
References
Stachurski, John. 2016. A Primer in Econometric Theory. Cambridge, Massachusetts: The MIT Press.